-
Notifications
You must be signed in to change notification settings - Fork 201
[Storage] Refactor storage operations with functors #8026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
state/protocol/badger/mutator.go
Outdated
| } | ||
| deferredBlockPersist.AddNextOperation(func(lctx lockctx.Proof, blockID flow.Identifier, rw storage.ReaderBatchWriter) error { | ||
| return operation.IndexLatestSealAtBlock(lctx, rw.Writer(), blockID, latestSeal.ID()) | ||
| return operation.IndexingLatestSealAtBlock(blockID, latestSeal.ID())(lctx, rw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We mentioned that any database operation requiring a lock context could be refactored using the functor pattern, but I think this case might be an exception—even after applying the refactor.
The functor isn’t particularly useful here since we don’t have the block ID until we start executing the deferred database operations.
I went ahead and refactored it anyway to illustrate my point, but in this case, it doesn’t provide any performance benefits over the original version and only adds unnecessary complexity.
Thoughts?
| } | ||
|
|
||
| // make sure all payload guarantees are stored | ||
| for _, guarantee := range payload.Guarantees { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of storing each individual guarantee, we pass in all the guarantees to be stored together, so that instead of creating N functors, we create only one functor to insert and index N guarantees.
| return fmt.Errorf("could not store guarantees: %w", err) | ||
| } | ||
|
|
||
| // make sure all payload seals are stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, I'm using guarantees as an example to apply the functor pattern and raise the discussion. I'd like to get feedbacks first and settle on the functor pattern before applying this to the storing operation of the rest of payloads , such as seals, results etc.
| )) | ||
| } | ||
|
|
||
| type CollectionGuaranteeWithID struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why creating this struct?
Because we would like to insert N guarantees with one functor, instead of N functors. And we need to bundle all the data before passing to the functor.
Why placing this struct here?
It might not be the best place, but at least it's close to where it is being used (InsertAndIndexGuarantees). I'm open for ideas for a better place for this struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we would like to insert N guarantees with one functor, instead of N functors.
Do we have any evidence that using N functors would have a non-negligible performance impact? I think it is likely to have an extremely small impact in practice, and don't think it is worth the complexity of optimizing for here.
Also, the need to pre-compute the guarantee IDs prior to passing to the functor is downstream of a lack of ID caching. General ID caching is a problem we will probably want to solve relatively soon, which would remove the need for this (we would just pass the Guarantee object around and call ID() as much as we want without per-call performance penalty). Again, unless there is evidence that we really need the optimization for this case, I suggest we opt for the simpler implementation.
| return nil | ||
| } | ||
| errmsg := fmt.Sprintf("InsertAndIndexResultApproval failed with approvalID %v, chunkIndex %v, resultID %v", | ||
| approvalID, chunkIndex, resultID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the err message, the functors like Overwriting and InsertingWIthMismatchCheck is too general that doesn't have the context. So I used WrapError to include more context.
| storing := operation.InsertAndIndexResultApproval(approval) | ||
|
|
||
| return func(lctx lockctx.Proof) error { | ||
| if !lctx.HoldsLock(storage.LockIndexResultApproval) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant, because InsertAndIndexResultApproval will check the lock
|
|
||
| // Upserting returns a functor, whose execution will append the given key-value-pair to the provided | ||
| // storage writer (typically a pending batch of database writes). | ||
| func Upserting(key []byte, val interface{}) func(storage.Writer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced by functors
| return WrapError("InsertAndIndexGuarantees failed", BindFunctors( | ||
| HoldingLock(storage.LockInsertBlock), | ||
| WrapError("insert guarantee failed", OverwritingMul(guaranteeIDKeys, guarantees)), | ||
| WrapError("index guarantee failed", InsertingMulWithMismatchCheck(collectionIDKeys, guaranteeIDs)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping error for inserting multiple records is a bit challenging, I can't find a cleaner way to provide context for each individual write. So ended up just wrapping the process.
| // Caller must ensure guaranteeID equals to guarantee.ID() | ||
| // Caller must acquire the [storage.LockInsertBlock] lock | ||
| // It returns [storage.ErrDataMismatch] if a different guarantee is already indexed for the collection | ||
| func InsertAndIndexGuarantee(guaranteeID flow.Identifier, guarantee *flow.CollectionGuarantee) Functor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is unused, but only for reference for now.
I initially created this function by combining InsertGuarantee and IndexGuarantee, and refactored with the functor pattern. But later I decided to use InsertAndIndexGuarantees, see comments there for my motivation.
I kept this one here for reference as it's easier to understand, will clean up after the discussion of functor pattern is settled.
storage/operation/functor.go
Outdated
| // Overwriting returns a functor that overwrites a key-value pair in the storage. | ||
| // The value is serialized using msgpack encoding. If the key already exists, | ||
| // the value will be overwritten without any checks. | ||
| // | ||
| // This is typically used for operations where we want to update existing data | ||
| // or where we don't care about potential conflicts. | ||
| func Overwriting(key []byte, val interface{}) Functor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Overwriting returns a functor that overwrites a key-value pair in the storage. | |
| // The value is serialized using msgpack encoding. If the key already exists, | |
| // the value will be overwritten without any checks. | |
| // | |
| // This is typically used for operations where we want to update existing data | |
| // or where we don't care about potential conflicts. | |
| func Overwriting(key []byte, val interface{}) Functor { | |
| // Upserting returns a functor that overwrites a key-value pair in the storage. | |
| // The value is serialized using msgpack encoding. If the key already exists, | |
| // the value will be overwritten without any checks. | |
| // | |
| // This is typically used for operations where we want to update existing data | |
| // or where we don't care about potential conflicts. | |
| func Upserting(key []byte, val interface{}) Functor { |
I prefer the term "upsert": it matches the non-functor function with the same semantics, and better captures that we are either inserting or updating (we are only overwriting if something has already been written).
| // | ||
| // This is used for operations where we want to ensure uniqueness and prevent | ||
| // accidental overwrites of existing data. | ||
| func InsertingWithExistenceCheck(key []byte, val interface{}) Functor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think separating this out from lock proof checking is a good idea. This function will not behave as expected if the user does not include lock checking separately. Currently there is not documentation of this critical requirement of the user of this function. We could add documentation, but I think it would be better to logically couple lock proof checking with corresponding read/write checks.
I like the idea, but I thinking adding too many layers of indirection will make it harder to write and verify correct use of locks. I think it would be better to couple each of these generic lock-requiring utility functions with HoldingLock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are lots of boilerplates, like the error to return, it allows us to work with higher abstraction.
We could also make a InsertingWithExistenceCheckAndLockCheck to bind with the lock checking, but IMO it's unnecessary.
So we are able to simplify the functor operation as:
func InsertAndIndexGuarantees(guaranteesWithID []*CollectionGuaranteeWithID) Functor {
...
return BindFunctors(
CheckHoldLockFunctor(storage.LockInsertBlock),
UpsertMulFunctor(guaranteeIDKeys, guarantees),
InsertFunctorMulWithMismatchCheck(collectionIDKeys, guaranteeIDs),
)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also make a InsertingWithExistenceCheckAndLockCheck to bind with the lock checking
I don't think we should have a separate function that also binds the lock check. I think InsertingWithExistenceCheck should remain as it is, except that it also does the lock check. The reason is that InsertingWithExistenceCheck must never be called on its own, it always must be called strictly after a lock check.
Currently these two things which must always happen together are separate in the API, and the fact that they must be manually called together every time for correct usage is not documented. I'm suggesting that we modify the API so that the only way to call InsertingWithExistenceCheck is to simultaneously define the required lock check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about keeping both:
func InsertAndIndexGuarantees(guaranteesWithID []*CollectionGuaranteeWithID) Functor {
...
return BindFunctors(
+ // keep this lock check here to make the functor composition flexible
CheckHoldLockFunctor(storage.LockInsertBlock),
UpsertMulFunctor(guaranteeIDKeys, guarantees),
- InsertFunctorMulWithMismatchCheck(collectionIDKeys, guaranteeIDs),
+ InsertFunctorMulWithMismatchCheck(
+ storage.LockInsertBlock, // check if this lock is held before running the insert
+ collectionIDKeys, guaranteeIDs),
)
}| // | ||
| // This is used for operations where we want to ensure data consistency and | ||
| // detect potential race conditions or conflicting updates. | ||
| func InsertingWithMismatchCheck(key []byte, val interface{}) Functor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same worry here about separating generic logic that must be lock-protected from the logic checking the lock, as mentioned above.
| )) | ||
| } | ||
|
|
||
| type CollectionGuaranteeWithID struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we would like to insert N guarantees with one functor, instead of N functors.
Do we have any evidence that using N functors would have a non-negligible performance impact? I think it is likely to have an extremely small impact in practice, and don't think it is worth the complexity of optimizing for here.
Also, the need to pre-compute the guarantee IDs prior to passing to the functor is downstream of a lack of ID caching. General ID caching is a problem we will probably want to solve relatively soon, which would remove the need for this (we would just pass the Guarantee object around and call ID() as much as we want without per-call performance penalty). Again, unless there is evidence that we really need the optimization for this case, I suggest we opt for the simpler implementation.
storage/operation/payload.go
Outdated
| return UpsertByKey(w, MakePrefix(codeBlockIDToLatestSealID, blockID), sealID) | ||
| } | ||
|
|
||
| func IndexingLatestSealAtBlock(blockID flow.Identifier, sealID flow.Identifier) Functor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General suggestion on naming
I think explicitly using the Functor terminology in the name of functor-returning storage functions would be clearer than using the present tense.
- Present tense vs imperative doesn't really communicate "this will be done in the future" to me
- The "functor" version is usually named very similarly to the "non-functor" version (eg.
IndexvsIndexing) -- I feel it would be easy to gloss over and miss the distinction
I'm suggestion as a naming convention: $(name you would have used had it not been a functor) + "Functor"
| func IndexingLatestSealAtBlock(blockID flow.Identifier, sealID flow.Identifier) Functor { | |
| func IndexLatestSealAtBlockFunctor(blockID flow.Identifier, sealID flow.Identifier) Functor { |
Co-authored-by: Jordan Schalm <[email protected]>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Working towards #7910